Small scan-handler fixes #16721

Merged

Conversation

wence- (Contributor) commented Sep 2, 2024

Description

Reject two more edge cases that we do not support.

We could easily support the case where the parquet read only needs to read the metadata, but it is low priority, so I have not done so here.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

We don't yet expose the metadata in pylibcudf, so we can't handle this
correctly.
@wence- added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Sep 2, 2024
@wence- requested a review from a team as a code owner September 2, 2024 16:57
@wence- requested review from bdice and Matt711 and removed request for a team September 2, 2024 16:57
@github-actions bot added the Python (Affects Python cuDF API.) and cudf.polars (Issues specific to cudf.polars) labels Sep 2, 2024
@bdice (Contributor) left a comment

Approving with a small question - it may have already been considered.

@@ -222,6 +222,8 @@ def __post_init__(self) -> None:
            raise NotImplementedError(
                "Read from cloud storage"
            )  # pragma: no cover; no test yet
        if any(p.startswith("https://") for p in self.paths):
Contributor

What about http:// or other protocols?

Contributor Author

This only occurs when the user writes scan_foo("hf://some_path/"), which is expanded into https://huggingface..../some_path before we see it.

So I could tighten up to match the full path.

Contributor Author

I used a broader brush so that it catches other URL-like things too.

Contributor

This is probably fine. If no https:// URLs are supported then let’s not make it specific to HuggingFace.
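
For illustration, the broad check settled on in this thread could be sketched roughly as below. This is a minimal sketch, not the actual cudf-polars scan node: the class name `ScanSketch`, the helper `is_supported`, and the error message text are all assumptions made for the example. Only the `startswith("https://")` test on `self.paths` comes from the diff above.

```python
from dataclasses import dataclass


@dataclass
class ScanSketch:
    """Hypothetical stand-in for a scan node; not the real cudf-polars class."""

    paths: list[str]

    def __post_init__(self) -> None:
        # Broad check: reject every https:// path rather than matching only
        # the expanded HuggingFace URL, since no https:// reads are supported.
        if any(p.startswith("https://") for p in self.paths):
            # The message text is an assumption made for this sketch.
            raise NotImplementedError("Read from https")


def is_supported(paths: list[str]) -> bool:
    """Return True if constructing the sketch node does not raise."""
    try:
        ScanSketch(paths)
    except NotImplementedError:
        return False
    return True
```

With this sketch, `is_supported(["data/part-0.parquet"])` is true and `is_supported(["https://example.com/data.parquet"])` is false. An `http://` path is not rejected by this check, which matches the reasoning in the thread: the `https://` form is expected to appear only because an `hf://` path was expanded before the handler saw it.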

@vyasr (Contributor) left a comment

Approving assuming Bradley is happy (seems like he is based on the open thread and the pre-approval).

bdice (Contributor) commented Sep 3, 2024

Yes, I’m happy.

@wence- merged commit c76e90b into rapidsai:feature/cudf-polars Sep 4, 2024
85 of 86 checks passed
@wence- deleted the wence/fea/polars-scan-fixes branch September 4, 2024 13:44
Labels: cudf.polars (Issues specific to cudf.polars), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
Projects
Status: Done
3 participants